Classic decision theory is based on the maximum expected utility (MEU) principle, but crucially ignores the resource costs incurred when determining optimal decisions. Here we propose an axiomatic framework for bounded decision-making that considers resource costs. Agents are formalized as probability measures over input-output streams. We postulate that any such probability measure can be assigned a corresponding conjugate utility function based on three axioms: utilities should be real-valued, additive and monotonic mappings of probabilities. We show that these axioms enforce a unique conversion law between utility and probability (and thereby, information). Moreover, we show that this relation can be characterized as a variational principle: given a utility function, its conjugate probability measure maximizes a free utility functional. Transformations of probability measures can then be formalized as a change in free utility due to the addition of new constraints expressed by a target utility function. Accordingly, one obtains a criterion to choose a probability measure that trades off the maximization of a target utility function and the cost of the deviation from a reference distribution. We show that optimal control, adaptive estimation and adaptive control problems can be solved in this way in a resource-efficient manner. When resource costs are ignored, the MEU principle is recovered. Our formalization might thus provide a principled approach to bounded rationality that establishes a close link to information theory.

1. Introduction 

Rational decision-making is based on the principle of (subjective) maximum expected utility (MEU) (von Neumann and Morgenstern, 1944; Savage, 1954; Anscombe and Aumann, 1963). According to the MEU principle, a rational agent chooses its action $a$ so as to maximize its expected utility

$$\mathbf{E}[U|a] = \sum_s \Pr(s|a)\,U(s)$$

given the probability $\Pr(s|a)$ that action $a \in \mathcal{A}$ will lead to outcome $s \in \mathcal{S}$ and given that the desirability of the outcome $s$ is measured by the utility $U(s) \in \mathbb{R}$. Thus, expected utilities express betting preferences over lotteries with uncertain outcomes. The optimal action $a^* \in \mathcal{A}$ is defined as the one that maximizes the expected utility, that is

$$a^* := \arg\max_a \mathbf{E}[U|a].$$
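As a concrete illustration of the MEU rule, the following minimal Python sketch evaluates $\mathbf{E}[U|a]$ for each action and returns the maximizer. The two actions, three outcomes and all numbers are hypothetical toy values, not taken from the paper.

```python
def expected_utility(pr_s_given_a, utility):
    """E[U|a] = sum over outcomes s of Pr(s|a) * U(s)."""
    return sum(p * utility[s] for s, p in pr_s_given_a.items())


def meu_action(model, utility):
    """a* = argmax_a E[U|a]; on ties, max() keeps the first action listed."""
    return max(model, key=lambda a: expected_utility(model[a], utility))


# Hypothetical decision problem: utilities U(s) and outcome models Pr(s|a).
utility = {"win": 10.0, "draw": 1.0, "lose": -5.0}
model = {
    "safe": {"win": 0.1, "draw": 0.9, "lose": 0.0},
    "risky": {"win": 0.5, "draw": 0.0, "lose": 0.5},
}

for action, outcomes in model.items():
    print(action, expected_utility(outcomes, utility))
print("a* =", meu_action(model, utility))
```

Here the search is trivial because there are only two actions and three outcomes; the cost of evaluating every sum is exactly what grows with the size of $\mathcal{S}$.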

What is not apparent from this simple formula, however, is that finding the optimal action can be very difficult, especially for decision-making problems in uncertain environments with a very large space of outcomes $\mathcal{S}$. One could easily imagine that computing the optimal answer is so costly (in terms of computational resources) that one would rather content oneself with a slightly "sub-optimal" solution that incurs lower resource costs. The problem is, however, that the MEU principle as stated above does not formally consider resource costs, and hence the problem of limited resources is ignored. Attempts to take resource costs into account for efficient decision-making have led to the important concept of (resource-)bounded rationality (Simon, 1982).



In this paper we propose an axiomatic formalization of bounded rationality that 
interprets a decision-maker's behavior (characterized by a probability measure) as an 
implicit manifestation of his preferences. We postulate three axioms that lead to a 
quantitative conversion between utilities and probabilities (and ultimately, information), 
which establishes a duality between the probability- and utility-representation of a decision-maker. We show that the link between these representations can be characterized by a
variational principle, which allows interpreting the probability measure as the equilibrium 
distribution over a constraint landscape determined by the utility function. Based on this 
interpretation, we then formalize the problem of maximizing the expectation of a target 
utility function as a transformation of an initial probability measure (encoding the prior 
behavior of the decision-maker) into a final probability measure that considers both the 
deviation from the initial probability measure and the new constraint given by the target 
utility function. We show how this leads to a principled way to choose a probability measure 
that optimally trades off the benefits of maximizing the target utilities against the costs of 
transforming the probability measure. We apply this formalism to stochastic systems that 
process an input-output (I/O) stream in a sequential fashion and construct a generalized 
variational principle for this setup. Finally, we show how to apply this generalized principle 
to derive solutions to the problems of optimal control, adaptive estimation and adaptive 
control. 

2. Conversion between probability and utility 

2.1 Preliminaries and notation 

We introduce the following notation. A set is denoted by a calligraphic letter like $\mathcal{X}$ and consists of elements or symbols. Strings are finite concatenations of symbols. The empty string is denoted by $\epsilon$. $\mathcal{X}^n$ denotes the set of strings of length $n$ based on $\mathcal{X}$. For substrings, the following shorthand notation is used: a string that runs from index $i$ to $k$ is written as $x_{i:k} := x_i x_{i+1} \dots x_{k-1} x_k$. Similarly, $x_{\le i} := x_1 x_2 \dots x_i$ is a string starting from the first index, and $x_{<i} := x_{\le i-1}$. By convention, $x_{i:j} := \epsilon$ if $i > j$. Logarithms are always taken with respect to base 2, thus $\log(2) = 1$. The symbol $\mathcal{P}(\mathcal{X})$ denotes the powerset of $\mathcal{X}$, i.e. the set of all subsets of $\mathcal{X}$.
To simplify the exposition, all probability spaces are assumed to be finite. Given this assumption, we fix some terminology. A probability space is a triple $(\Omega, \mathcal{F}, P)$ where $\Omega$ is the sample space, $\mathcal{F} := \mathcal{P}(\Omega)$ is the $\sigma$-algebra of events, and $P$ is the probability measure over $\mathcal{F}$. A sample or outcome is an element $\omega \in \Omega$. An event is a member of $\mathcal{F}$ and hence a finite set of outcomes. An atom is a singleton $\{\omega\} \in \mathcal{F}$. A random variable is a function $X : \Omega \to \mathcal{X}$ mapping each outcome $\omega$ into a symbol $X(\omega)$ from a finite alphabet $\mathcal{X}$. The probability of the random variable $X$ taking on the value $x \in \mathcal{X}$ is defined as $P(x) := P(X = x) := P(\{\omega \in \Omega : X(\omega) = x\})$.

2.2 Utility 

Consider a stochastic system whose behavior is represented by a probability space $(\Omega, \mathcal{F}, P)$. The probability measure $P$ fully characterizes the generative law of the potential events that the system can obtain. Thus, if $P(A) > P(B)$, then the propensity of $A$ is higher than that of $B$. This difference in probability can be given a teleological interpretation: $A$ is more probable than $B$ because $A$ is more desirable than $B$. For reasons that will become apparent, a measure that quantifies such differences in desirability is called a utility function. If there is such a measure, then it is reasonable to demand the following three properties:

i. Utilities should be mappings from conditional events into real numbers. 

ii. Utilities should be additive up to an arbitrary translation constant.

iii. A more probable event should have a higher utility than a less probable event. 

The three properties can then be summarized as follows. 

Definition 1. Let $(\Omega, \mathcal{F}, P)$ be a probability space. A function $U$ is a utility function for $P$ iff it has the following three properties for all events $A, B, C, D \in \mathcal{F}$ and some constant $\beta \in \mathbb{R}$:

i. $U(A|B) \in \mathbb{R}$, (real-valued)

ii. $U(A \cap B|C) = U(A|C) + U(B|A \cap C) - \beta$, (additive)

iii. $P(A|B) > P(C|D) \Leftrightarrow U(A|B) > U(C|D)$. (monotonic)

Furthermore, we use the abbreviation $U(A) := U(A|\Omega)$ for "unconditional" events. From property (ii) it is seen that the translation $U'(\cdot) = U(\cdot) - \beta$ leads to a strict additivity of $U'$:

$$\begin{aligned}
U(A \cap B) &= U(A) + U(B|A) - \beta,\\
\big(U'(A \cap B) + \beta\big) &= \big(U'(A) + \beta\big) + \big(U'(B|A) + \beta\big) - \beta,\\
U'(A \cap B) &= U'(A) + U'(B|A).
\end{aligned}$$



That is, the utility of a joint event should be obtained by summing up the utilities of the sub-events (up to an arbitrary translation constant). The translation constant accounts for the fact that absolute values of utilities are not meaningful: only differences between utilities matter. For example, the "utility of drinking coffee and eating a croissant" should equal "the utility of drinking coffee" plus the "utility of having a croissant given the reward of drinking coffee" minus a translation constant.
The following theorem shows that these three properties enforce a strict mapping between probabilities and utilities.

Theorem 2. If $f$ is such that $U(A|B) = f(P(A|B))$ for any probability space $(\Omega, \mathcal{F}, P)$, then $f$ is of the form

$$f(\cdot) = \alpha \log(\cdot) + \beta,$$

where $\alpha > 0$ is an arbitrary positive constant and $\beta$ is an arbitrary constant.

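Proof. Let $f$ be such that $f(P(C|D)) = U(C|D)$ for all $C, D \in \mathcal{F}$. Let $A_1, A_2, \dots, A_n \in \mathcal{F}$ be a sequence of events such that $P(A_1) = P(A_i \,|\, \bigcap_{j<i} A_j) > 0$ for all $i = 1, \dots, n$. Applying $f$ yields the equivalence

$$P(A_1) = P\Big(A_i \,\Big|\, \bigcap_{j<i} A_j\Big) \iff U(A_1) = U\Big(A_i \,\Big|\, \bigcap_{j<i} A_j\Big)$$

for all $i = 1, \dots, n$. By the product rule for probabilities, $P(\bigcap_{i=1}^n A_i) = \prod_{i=1}^n P(A_i \,|\, \bigcap_{j<i} A_j) = P(A_1)^n$, and by repeatedly applying the additivity property for utilities, one can show

$$U\Big(\bigcap_{i=1}^n A_i\Big) - \beta = \sum_{i=1}^n \Big( U\Big(A_i \,\Big|\, \bigcap_{j<i} A_j\Big) - \beta \Big) = n\big(U(A_1) - \beta\big) = n\big(f(P(A_1)) - \beta\big).$$

Since $U(\bigcap_{i=1}^n A_i) = f(P(A_1)^n)$ and $P(A_1)$ is arbitrary, this means that

$$f(p^n) - \beta = n\big(f(p) - \beta\big)$$

for arbitrary $p \in (0, 1]$ and $n \in \mathbb{N}$.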

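The rest of the argument parallels Shannon's entropy theorem (Shannon, 1948). Let $p, q \in (0, 1)$ such that $q < p$ (the case $p = 1$ follows from $f(1) = \beta$, which is the case $n = 2$ of the functional equation). Choose an arbitrarily large $m \in \mathbb{N}$ and find an $n \in \mathbb{N}$ to satisfy $q^{m+1} < p^n \le q^m$. Taking the logarithm and dividing by $n \log q < 0$ (which reverses the inequalities), one obtains

$$\frac{m}{n} \le \frac{\log p}{\log q} < \frac{m}{n} + \frac{1}{n}. \quad (1)$$

Similarly, using $f(p^n) - \beta = n(f(p) - \beta)$ and the monotonicity of $f$, we have

$$(m+1)\big(f(q) - \beta\big) < n\big(f(p) - \beta\big) \le m\big(f(q) - \beta\big).$$

Since monotonicity together with $f(q^2) - \beta = 2(f(q) - \beta)$ implies $f(q) - \beta < 0$, dividing the last set of inequalities by $n(f(q) - \beta)$ yields

$$\frac{m}{n} \le \frac{f(p) - \beta}{f(q) - \beta} < \frac{m}{n} + \frac{1}{n}. \quad (2)$$

Combining the inequalities in (1) and (2), one gets

$$\left| \frac{\log p}{\log q} - \frac{f(p) - \beta}{f(q) - \beta} \right| < \frac{2}{n}.$$

Since $m$, and with it $n$, can be chosen arbitrarily large, this implies

$$\frac{\log p}{\log q} = \frac{f(p) - \beta}{f(q) - \beta}$$

in the limit $n \to \infty$. Fixing $q$ and rearranging terms gives the functional form

$$f(p) = \alpha \log p + \beta,$$

where $\alpha = (f(q) - \beta)/\log q$ must be positive to satisfy the monotonicity property. □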
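The two facts the proof relies on can be spot-checked numerically. The sketch below, assuming hypothetical constants $\alpha = 2$ and $\beta = -1$, confirms that the logarithmic form satisfies $f(p^n) - \beta = n(f(p) - \beta)$ and is strictly increasing, and shows why a linear map cannot satisfy the same equation with a single $\beta$. It is a sanity check only, not a substitute for the uniqueness argument.

```python
import math

ALPHA, BETA = 2.0, -1.0  # hypothetical constants; Theorem 2 only requires ALPHA > 0


def f(p):
    """Logarithmic conversion f(p) = ALPHA * log2(p) + BETA (base 2, as in Section 2.1)."""
    return ALPHA * math.log2(p) + BETA


# Functional equation from the proof: f(p**n) - beta = n * (f(p) - beta).
for p in (0.9, 0.5, 0.1):
    for n in (1, 2, 5, 10):
        assert math.isclose(f(p ** n) - BETA, n * (f(p) - BETA))

# Monotonicity: a more probable event receives a strictly higher utility.
ps = [0.05, 0.2, 0.5, 0.99]
assert all(f(a) < f(b) for a, b in zip(ps, ps[1:]))

# A non-logarithmic map such as g(p) = p fails: for n = 2 the equation
# g(p**2) - beta = 2 * (g(p) - beta) forces beta = 2p - p**2, which varies with p.
needed_beta = {p: 2 * p - p ** 2 for p in (0.5, 0.9)}
print(needed_beta)
```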

Thus, Theorem 2 establishes the relation

$$U(A|B) = \alpha \log P(A|B) + \beta,$$

and in particular,

$$U(\Omega) = \beta.$$

In general, if a probability measure P and a utility function U satisfy this relation, then we 
say that they are conjugate. Given that this transformation is a bijection, one has that 

$$P(A|B) = \exp\Big\{\frac{1}{\alpha}\big(U(A|B) - U(\Omega)\big)\Big\}.$$

There are two important observations with respect to this particular functional form. First, note that $h(A|B) := -\log P(A|B)$ is just the Shannon information content of $A$ given $B$. Therefore,

$$U(A|B) = -\alpha\, h(A|B) + \beta.$$

Second, this transformation implies that the probability measure $P$ is the Gibbs measure with temperature $\alpha$ and energy levels $E(\omega) := -U(\{\omega\})$, i.e. the measure given by

$$P(A) = \frac{\sum_{\omega \in A} \exp\{-\frac{1}{\alpha} E(\omega)\}}{\sum_{\omega \in \Omega} \exp\{-\frac{1}{\alpha} E(\omega)\}}$$

for all $A \in \mathcal{F}$. In statistical mechanics, the Gibbs measure is the equilibrium distribution for a given energy landscape. For this reason, we call $\alpha > 0$ the temperature. The definition of utility extends to random variables in the natural way. Thus, given a random variable $X$ with values in $\mathcal{X}$, the utility of $x \in \mathcal{X}$ is given by $U(x) = \alpha \log P(x) + \beta$.
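The conjugacy between $P$ and $U$ can be exercised in a few lines of Python. This minimal sketch assumes natural logarithms so that `math.exp` inverts `math.log`; with the base-2 convention of Section 2.1 one would use powers of 2 instead, which amounts to rescaling $\alpha$ by $\ln 2$. The distribution `P` and the constants `alpha` and `beta` are hypothetical.

```python
import math


def utilities_from_probs(P, alpha, beta):
    """Conjugate utility U(x) = alpha * log P(x) + beta."""
    return {x: alpha * math.log(p) + beta for x, p in P.items()}


def gibbs(U, alpha):
    """Gibbs measure with energies E(x) = -U(x): P(x) = exp(U(x)/alpha) / Z."""
    m = max(U.values())  # shift by the maximum utility for numerical stability
    w = {x: math.exp((u - m) / alpha) for x, u in U.items()}
    z = sum(w.values())
    return {x: v / z for x, v in w.items()}


def utility_of_omega(U, alpha):
    """U(Omega) = alpha * log sum_x exp(U(x)/alpha), which equals beta for a conjugate pair."""
    m = max(U.values())
    return m + alpha * math.log(sum(math.exp((u - m) / alpha) for u in U.values()))


P = {"a": 0.5, "b": 0.3, "c": 0.2}   # hypothetical distribution
alpha, beta = 1.5, 4.0               # hypothetical temperature and offset
U = utilities_from_probs(P, alpha, beta)
P_back = gibbs(U, alpha)                          # recovers P from U
info = {x: -math.log(p) for x, p in P.items()}    # information content h(x)
```

Subtracting the maximum utility before exponentiating does not change the normalized result, because the shift cancels in the ratio; it only guards against overflow.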

2.3 Variational principle 

The conversion between probability and utility established in Theorem 2 satisfies a variational principle.

Theorem 3. Let $X$ be a random variable with values in $\mathcal{X}$. Let $P$ and $U$ be a conjugate pair of probability measure and utility function over $X$. Define the free utility functional as

$$J(\Pr; U) := \sum_{x \in \mathcal{X}} \Pr(x)\,U(x) - \alpha \sum_{x \in \mathcal{X}} \Pr(x) \log \Pr(x),$$

where $\Pr$ is an arbitrary probability measure over $X$. Then,

$$J(\Pr; U) \le J(P; U) = U(\Omega).$$
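Proof. A similar proof to the present one is given in Keller (1993, Theorem 1.1.3). Rewriting terms using the utility-probability conversion and applying Jensen's inequality yields

$$\begin{aligned}
J(\Pr; U) &= \sum_{x \in \mathcal{X}} \Pr(x)\,U(x) - \alpha \sum_{x \in \mathcal{X}} \Pr(x) \log \Pr(x)\\
&= \alpha \sum_{x \in \mathcal{X}} \Pr(x) \log \frac{\exp(\frac{1}{\alpha} U(x))}{\Pr(x)}\\
&\le \alpha \log \sum_{x \in \mathcal{X}} \Pr(x)\, \frac{\exp(\frac{1}{\alpha} U(x))}{\Pr(x)}\\
&= \alpha \log \sum_{x \in \mathcal{X}} \exp\big(\tfrac{1}{\alpha} U(x)\big) = U(\Omega),
\end{aligned}$$

where the last equality follows from summing $P(x) = \exp\{\frac{1}{\alpha}(U(x) - U(\Omega))\}$ over $x$. Equality holds if and only if $\exp(\frac{1}{\alpha} U(x))/\Pr(x)$ is constant, i.e. if and only if $\Pr = P$. □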



The free utility² is the expected utility of the system plus the uncertainty (entropy) over the outcome, weighted by the temperature $\alpha$. The variational principle tells us that the probability law $P$ of the system is the one that maximizes the free utility for a given utility function $U$, since

$$P = \arg\max_{\Pr} J(\Pr; U).$$

Here the utility function $U$ plays the role of a constraint landscape for the probability measure $P$. As the temperature $\alpha$ approaches zero, the probability measure $P(x)$ approaches a delta function $\delta_{x^*}(x)$, where $x^* = \arg\max_x U(x)$. Similarly, as $\alpha \to \infty$, $P(x) \to \frac{1}{|\mathcal{X}|}$, i.e. the uniform distribution over $\mathcal{X}$. Hence, the temperature $\alpha$ plays the role of the conversion factor between resources and utilities.
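Theorem 3 and the temperature limits can be checked numerically. The minimal sketch below assumes natural logarithms and a hypothetical three-valued utility landscape; it computes $J(\Pr; U)$, confirms that the Gibbs measure attains $U(\Omega)$ and that randomly drawn distributions never exceed it, and shows the near-delta and near-uniform behavior at low and high temperature.

```python
import math
import random


def free_utility(Pr, U, alpha):
    """J(Pr; U) = sum_x Pr(x) U(x) - alpha * sum_x Pr(x) log Pr(x), with 0 log 0 := 0."""
    expected = sum(Pr[x] * U[x] for x in U)
    entropy = -sum(p * math.log(p) for p in Pr.values() if p > 0)
    return expected + alpha * entropy


def gibbs(U, alpha):
    """Conjugate measure P(x) proportional to exp(U(x)/alpha)."""
    m = max(U.values())
    w = {x: math.exp((u - m) / alpha) for x, u in U.items()}
    z = sum(w.values())
    return {x: v / z for x, v in w.items()}


U = {"x1": 1.0, "x2": 0.0, "x3": -2.0}  # hypothetical utility landscape
alpha = 0.7                             # hypothetical temperature
P = gibbs(U, alpha)
J_max = free_utility(P, U, alpha)
U_omega = alpha * math.log(sum(math.exp(u / alpha) for u in U.values()))

# No randomly drawn distribution achieves a higher free utility.
rng = random.Random(0)
for _ in range(1000):
    w = [rng.random() for _ in U]
    s = sum(w)
    Pr = {x: wi / s for x, wi in zip(U, w)}
    assert free_utility(Pr, U, alpha) <= J_max + 1e-12

cold = gibbs(U, 0.01)  # close to a delta at argmax_x U(x), i.e. "x1"
hot = gibbs(U, 1e3)    # close to the uniform distribution over the three values
```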




Figure 1: A transformation from a system $(P_i, U_i)$ into a system $(P_f, U_f)$ by addition of a constraint $U_*$.



The variational principle allows conceptualizing transformations of stochastic systems (Figure 1). Consider an initial system having probability measure $P_i$ and utility function $U_i$. This system satisfies the equation

$$J_i := \sum_{x \in \mathcal{X}} P_i(x)\,U_i(x) - \alpha \sum_{x \in \mathcal{X}} P_i(x) \log P_i(x) = U_i(\Omega).$$



2. The functional $F := -J$ is also known as the Helmholtz free energy in thermodynamics. $F$ is a measure of the "useful" work obtainable from a closed thermodynamic system at a constant temperature and volume.
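We add new constraints represented by the utility function $U_*$. Then, the resulting utility function $U_f$ is given by the sum

$$U_f = U_i + U_*,$$

and the resulting probability measure $P_f$ maximizes

$$\begin{aligned}
J(\Pr; U_f) &= \sum_{x \in \mathcal{X}} \Pr(x)\,U_f(x) - \alpha \sum_{x \in \mathcal{X}} \Pr(x) \log \Pr(x)\\
&= \sum_{x \in \mathcal{X}} \Pr(x)\big(U_i(x) + U_*(x)\big) - \alpha \sum_{x \in \mathcal{X}} \Pr(x) \log \Pr(x)\\
&= \sum_{x \in \mathcal{X}} \Pr(x)\,U_*(x) - \alpha \sum_{x \in \mathcal{X}} \Pr(x) \log \frac{\Pr(x)}{P_i(x)} + U_i(\Omega),
\end{aligned}$$

where the last step uses $U_i(x) = \alpha \log P_i(x) + U_i(\Omega)$.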

Let $J_f := J(P_f; U_f)$. The difference in free utility is

$$J_f - J_i = \sum_{x \in \mathcal{X}} P_f(x)\,U_*(x) - \alpha \sum_{x \in \mathcal{X}} P_f(x) \log \frac{P_f(x)}{P_i(x)}. \quad (3)$$
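Equation (3) can be verified numerically. The minimal sketch below builds a conjugate pair $(P_i, U_i)$, adds a target utility $U_*$, takes $P_f$ as the Gibbs measure of $U_f = U_i + U_*$, and compares $J_f - J_i$ with the right-hand side of (3). All numbers are hypothetical and natural logarithms are assumed.

```python
import math

alpha, beta_i = 0.5, 2.0                    # hypothetical temperature and U_i(Omega)
P_i = {"x1": 0.6, "x2": 0.3, "x3": 0.1}     # hypothetical initial system
U_i = {x: alpha * math.log(p) + beta_i for x, p in P_i.items()}  # conjugate utility
U_star = {"x1": 0.0, "x2": 0.4, "x3": 1.0}  # hypothetical added constraint


def free_utility(Pr, U):
    """J(Pr; U) with the global temperature alpha; zero-probability terms are skipped."""
    return sum(Pr[x] * U[x] - alpha * Pr[x] * math.log(Pr[x]) for x in Pr if Pr[x] > 0)


# Final system: Gibbs measure of U_f = U_i + U_star.
U_f = {x: U_i[x] + U_star[x] for x in P_i}
z = sum(math.exp(u / alpha) for u in U_f.values())
P_f = {x: math.exp(u / alpha) / z for x, u in U_f.items()}

lhs = free_utility(P_f, U_f) - free_utility(P_i, U_i)  # J_f - J_i
rhs = (sum(P_f[x] * U_star[x] for x in P_f)
       - alpha * sum(P_f[x] * math.log(P_f[x] / P_i[x]) for x in P_f))  # Equation (3)
```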

The difference in free utility has an interpretation that is crucial for the formalization of bounded rationality: it is the expected target utility $U_*$ (first term) penalized by the cost of transforming $P_i$ into $P_f$ (second term). Clearly, (3) is a functional to be maximized. Depending on the givens and the unknowns, this leads to different variational problems. We emphasize the two cases that are important for our exposition:

1. Control. If we fix the initial probability measure $P_i$ and the constraint utilities $U_*$, then the final system $P_f$ optimizes the trade-off between utility and resource costs. That is,

$$P_f = \arg\max_{\Pr} \sum_{x \in \mathcal{X}} \Pr(x)\,U_*(x) - \alpha \sum_{x \in \mathcal{X}} \Pr(x) \log \frac{\Pr(x)}{P_i(x)}. \quad (4)$$

The solution is given by

$$P_f(x) \propto P_i(x) \exp\Big\{\frac{1}{\alpha} U_*(x)\Big\}.$$

In particular, at very low temperature $\alpha \approx 0$, (3) becomes

$$J_f - J_i \approx \sum_{x \in \mathcal{X}} P_f(x)\,U_*(x),$$

and hence resource costs are ignored in the choice of $P_f$, leading to $P_f \approx \delta_{x^*}(x)$, where $x^* = \arg\max_x U_*(x)$. Similarly, at a high temperature, the difference is

$$J_f - J_i \approx -\alpha \sum_{x \in \mathcal{X}} P_f(x) \log \frac{P_f(x)}{P_i(x)},$$

and hence only resource costs matter, leading to $P_f \approx P_i$.
2. Estimation. If we fix the final probability measure $P_f$ and the constraint utilities $U_*$, then the initial system $P_i$ satisfies

$$P_i = \arg\max_{\Pr} \sum_{x \in \mathcal{X}} P_f(x)\,U_*(x) - \alpha \sum_{x \in \mathcal{X}} P_f(x) \log \frac{P_f(x)}{\Pr(x)}, \quad (5)$$

and thus we have recovered the minimum relative entropy principle for estimation, 
having the solution 



Varying the initial distribution $P_i$ is equivalent to varying the utility $U_i$ as part of $U_f$ such that the given distribution $P_f$ becomes the equilibrium distribution.
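The control solution $P_f(x) \propto P_i(x)\exp\{\frac{1}{\alpha}U_*(x)\}$ is easy to compute and to sanity-check against (4). In the minimal sketch below (hypothetical `P_i`, `U_star` and temperatures; natural logarithms), randomly drawn candidate distributions never beat the closed-form solution, low temperature concentrates $P_f$ on $\arg\max_x U_*(x)$, and high temperature leaves $P_f$ close to $P_i$.

```python
import math
import random


def control_solution(P_i, U_star, alpha):
    """Closed-form maximizer of (4): P_f(x) proportional to P_i(x) * exp(U_star(x) / alpha)."""
    m = max(U_star.values())  # shift for numerical stability; cancels after normalization
    w = {x: P_i[x] * math.exp((U_star[x] - m) / alpha) for x in P_i}
    z = sum(w.values())
    return {x: v / z for x, v in w.items()}


def objective(Pr, P_i, U_star, alpha):
    """Objective in (4): expected target utility minus alpha * KL(Pr || P_i)."""
    gain = sum(Pr[x] * U_star[x] for x in Pr)
    cost = sum(p * math.log(p / P_i[x]) for x, p in Pr.items() if p > 0)
    return gain - alpha * cost


P_i = {"x1": 0.7, "x2": 0.2, "x3": 0.1}     # hypothetical prior behavior
U_star = {"x1": 0.0, "x2": 1.0, "x3": 3.0}  # hypothetical target utility

alpha = 0.5
P_f = control_solution(P_i, U_star, alpha)
best = objective(P_f, P_i, U_star, alpha)
rng = random.Random(1)
for _ in range(1000):
    w = [rng.random() for _ in P_i]
    s = sum(w)
    Pr = {x: wi / s for x, wi in zip(P_i, w)}
    assert objective(Pr, P_i, U_star, alpha) <= best + 1e-12

cold = control_solution(P_i, U_star, 0.01)  # resource costs negligible
hot = control_solution(P_i, U_star, 1e4)    # resource costs dominate
```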

Alternatively, one can regard control as the problem of finding $P_f$ given $U_*$ and $U_i$, and estimation as the problem of finding $U_i$ given $P_f$ and $U_*$. This is easily seen after rewriting the terms in (3).

3. I/O systems 

We now turn our discussion to I/O systems. Informally, I/O systems model anything that has an I/O stream, like a calculator, a human cell, an animal, a computer program or a robot. In this sense, an I/O system is not required to be a discretely identifiable (physical) entity as long as there is a viewpoint from which it appears to have an I/O stream. For example, from a robot's perspective, its environment is a well-defined system too because it has an "input channel" to absorb the robot's actions and an "output channel" to produce the robot's perceptions.

The mathematical description of an I/O system can be done at several levels. This paper 
focusses on two of them: behavior and beliefs. A model of behavior is a direct specification 
of an I/O system that merely describes the statistics of the I/O stream. A model of beliefs 
is an indirect specification of an I/O system that has the advantage of representing the I/O 
system's underlying assumptions that give rise to its behavior. 

3.1 Model of behavior 

Formally, an I/O system is an abstract model of a (stochastic) machine that processes input symbols and generates output symbols. These symbols are exchanged with another (external) I/O system via an I/O channel (Figure 2).

The interaction between two I/O systems proceeds in cycles $t = 1, 2, \dots, T$ following a predefined protocol. The protocol determines which system is responsible for each cycle. In cycle $t$, the responsible system generates a symbol $x_t$ conditioned on the past symbols $x_{<t}$. Then the cycle $t + 1$ starts.

If one wants to characterize the way an I/O system behaves, it is necessary to specify 
the statistics governing its potential I/O stream. One can encapsulate all the details by 
providing the probability distribution over the potential I/O sequences. 
Figure 2: Two I/O systems P and Q interacting with each other.

Definition 4. An I/O system is a probability measure $P$ over $T$ random variables $X_1, X_2, \dots, X_T$ taking on values in finite alphabets $\mathcal{X}_1, \mathcal{X}_2, \dots, \mathcal{X}_T$.

Because the I/O system processes both input and output symbols, the probability measure $P$ contains both evidential and generative probabilities. The evidential probabilities, called plausibilities, allow the I/O system to infer properties about its input stream, while the generative probabilities, called propensities, prescribe the law to generate its output stream. Hence, if $x_t$ is generated by an external I/O system, then $P(x_t|x_{<t})$ is the plausibility of observing $x_t$ given the past I/O string $x_{<t}$, while if $x_t$ is generated by the I/O system $P$ itself, then $P(x_t|x_{<t})$ is the propensity of producing $x_t$ given the past I/O string $x_{<t}$.
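Definition 4 only asks for a joint measure over the I/O stream; the conditionals $P(x_t|x_{<t})$ used above follow from it by marginalization. The minimal sketch below encodes a hypothetical I/O system over two binary symbols as a Python dictionary. Whether a given conditional is read as a plausibility or a propensity depends only on which system generated $x_t$, not on how it is computed.

```python
import math

# Hypothetical I/O system over T = 2 binary symbols: P(x1 x2) for every string.
P = {("0", "0"): 0.4, ("0", "1"): 0.1, ("1", "0"): 0.2, ("1", "1"): 0.3}


def prefix_prob(P, prefix):
    """P(x_{<=t}): sum of the joint over all strings that start with the prefix."""
    prefix = tuple(prefix)
    return sum(p for s, p in P.items() if s[:len(prefix)] == prefix)


def conditional(P, past, x_t):
    """P(x_t | x_{<t}) = P(x_{<t} x_t) / P(x_{<t})."""
    return prefix_prob(P, tuple(past) + (x_t,)) / prefix_prob(P, past)


# Chain rule check: P(x1 x2) = P(x1) * P(x2 | x1) for every string.
for (x1, x2), p in P.items():
    assert math.isclose(prefix_prob(P, (x1,)) * conditional(P, (x1,), x2), p)
```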

3.2 Model of beliefs 

While the previous definition contains all the necessary details to describe the behavior of an I/O system, it falls short of modeling the I/O system's underlying assumptions that bring about its behavior. Importantly, it is desirable to model the uncertainties that two interacting I/O systems have about each other, because these uncertainties play a fundamental role in conceptualizing adaptive behavior. The aim of this section is to introduce a model for I/O systems that allows explicitly representing these uncertainties.



3.2.1 Causal Models 

From the point of view of an I/O system $P$ that is interacting with an I/O system $Q$, one needs to represent (a) the uncertainty $P$ has about $Q$ and (b) the uncertainty $Q$ has about $P$. Following a Bayesian approach, both uncertainties are modeled by the introduction of hidden/undisclosed variables. More specifically, cases (a) and (b) can be modeled by undisclosed inputs and undisclosed outputs respectively, i.e. symbols that are generated but kept hidden from the other system³. The inclusion of undisclosed random variables requires extending the interaction model as follows.



3. Undisclosed inputs, commonly known as hypotheses or latent variables in Bayesian statistics, are at the heart of Bayesian inference (Jaynes and Bretthorst, 2003). In game theory, undisclosed outputs determine the player types. Player types are the crucial component of a Bayesian game whose purpose is to model games with incomplete information (Osborne and Rubinstein, 1999).
The interaction between two I/O systems proceeds in cycles $t = 1, 2, \dots, T$. In each cycle, one of the two systems generates a symbol $x_t$ conditioned on the previously observed symbols. The symbol $x_t$ might be either disclosed or undisclosed. A disclosed symbol is observed by both systems, while an undisclosed one is only observed by the system that generated it. After a symbol is generated, the I/O systems that have observed it update their belief states.

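To illustrate how uncertainty is modeled, consider the familiar Bayesian estimator. Let $\mathcal{D} := \mathcal{D}_1 \times \dots \times \mathcal{D}_N$ be a set of strings, where each $\mathcal{D}_n$, $1 \le n \le N$, is a finite alphabet. A Bayesian estimator over $\mathcal{D}$ with hypotheses $\Theta$ is a probability measure $\mathbf{P}$ over $\Theta \times \mathcal{D}$ of the form

$$\mathbf{P}(d_{\le N}) = \sum_{\theta \in \Theta} \mathbf{P}(d_{\le N}|\theta)\,\mathbf{P}(\theta), \quad (6)$$

where: $d_{\le N}$ is an observation string with $d_n \in \mathcal{D}_n$ for all $1 \le n \le N$; $\theta \in \Theta$ is a hypothesis; $\mathbf{P}(d_{\le N}|\theta)$ is the likelihood of $d_{\le N}$ under the hypothesis $\theta$; and $\mathbf{P}(\theta)$ is the prior probability of the hypothesis $\theta$. The Bayesian estimator is an adaptive predictor: it uses the symbols observed in the past to predict the next symbol. The predictive distribution over the $n$-th observation ($1 \le n \le N$) conditioned on the past observations $d_{<n}$ is then given by

$$\mathbf{P}(d_n|d_{<n}) = \sum_{\theta \in \Theta} \mathbf{P}(d_n|\theta, d_{<n})\,\mathbf{P}(\theta|d_{<n}), \quad (7)$$

where $\mathbf{P}(d_n|\theta, d_{<n})$ is the likelihood of $d_n$ under hypothesis $\theta$ given the past observations $d_{<n}$ and $\mathbf{P}(\theta|d_{<n})$ is the posterior probability of $\theta$ given the past observations $d_{<n}$. Both of these quantities are obtained from $\mathbf{P}(\theta, d_{\le N})$ by applying standard probability calculus. It is easy to see that this probabilistic model corresponds to an I/O system $P$ over a sequence $x_{\le T}$ where: $T := N + 1$; $x_1 := \theta$ is an undisclosed input drawn from $\mathcal{X}_1 := \Theta$; and $x_t := d_{t-1}$ ($2 \le t \le T$) is a disclosed input drawn from $\mathcal{X}_t := \mathcal{D}_{t-1}$. The probability measure $P$ is constructed from $\mathbf{P}$ as

$$P(\theta, d_{\le N}) := \mathbf{P}(\theta)\,\mathbf{P}(d_1)\,\mathbf{P}(d_2|d_1) \cdots \mathbf{P}(d_N|d_{<N}),$$

where one has to notice that $P \neq \mathbf{P}$ because $\theta$ is unobserved and thus cannot be used to condition, i.e.

$$P(d_n|\theta, d_{<n}) = \mathbf{P}(d_n|d_{<n}) = \sum_{\theta' \in \Theta} \mathbf{P}(d_n|\theta', d_{<n})\,\mathbf{P}(\theta'|d_{<n}),$$

which in general differs from the likelihood $\mathbf{P}(d_n|\theta, d_{<n})$.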

Hence, this illustrates two facts. First, undisclosed inputs play the role of hypotheses. Second, the model $\mathbf{P}$ and the I/O system $P$ are in general not the same.
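The distinction between $\mathbf{P}$ and $P$ can be made concrete with a small Bayesian estimator. The minimal sketch below assumes a hypothetical set of three coin biases with a uniform prior and i.i.d. binary observations; it computes the predictive distribution (7) and shows that, in the I/O system, conditioning on the undisclosed $\theta$ returns the predictive probability rather than the likelihood.

```python
THETAS = (0.2, 0.5, 0.8)              # hypothetical hypotheses (coin biases)
PRIOR = {th: 1 / 3 for th in THETAS}  # hypothetical uniform prior P(theta)


def likelihood(d, theta):
    """Model likelihood P(d_n | theta, d_<n) for i.i.d. coin flips d in {0, 1}."""
    return theta if d == 1 else 1 - theta


def posterior(past):
    """Model posterior P(theta | d_<n) via Bayes' rule."""
    w = dict(PRIOR)
    for d in past:
        w = {th: w[th] * likelihood(d, th) for th in THETAS}
    z = sum(w.values())
    return {th: v / z for th, v in w.items()}


def predictive(d, past):
    """Equation (7): P(d_n | d_<n) = sum_theta P(d_n | theta, d_<n) P(theta | d_<n)."""
    post = posterior(past)
    return sum(likelihood(d, th) * post[th] for th in THETAS)


def io_conditional(d, theta, past):
    """I/O system P(d_n | theta, d_<n): theta is undisclosed, so it is ignored."""
    return predictive(d, past)


past = [1, 1, 0, 1]
print(predictive(1, past), likelihood(1, 0.8))
```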

Extending this scheme to include outputs as well is not straightforward. If some of the $d_n$ are generated by the system itself, then (7) does not hold anymore, because outputs are syntactically different from inputs, requiring belief updates governed by causal constraints. Essentially, an input provides the system with information about the whole history of the stochastic process, while an output, by virtue of being generated by the system itself as a function of the past, provides the system only with information about the present and future of the stochastic process because the past cannot be changed. See for instance Shafer (1996), Pearl (2000), Spirtes and Scheines (2001) and Dawid (2010) for a more in-depth exposition of causality.

In order to carry out the belief updates following outputs, it is necessary to know the causal probability model for $P$. The causal probability model consists of a set of conditional probability measures highlighting the functional dependencies amongst the random variables. This is reflected in the following definition.
Figure 3: The four types of random variables with respect to P. Solid arrows mean that the value of the random variable is disclosed, while dashed arrows mean that the value is undisclosed. The enclosed area contains the random variables that are observable by P.



Definition 5. A causal model of an I/O system is a set of $T$ conditional probability measures $\mathbf{P}(X_1), \mathbf{P}(X_2|X_1), \dots, \mathbf{P}(X_T|X_{<T})$ over typed random variables $X_1, X_2, \dots, X_T$ taking on values in finite alphabets $\mathcal{X}_1, \mathcal{X}_2, \dots, \mathcal{X}_T$.

The causal model explains how the random variables functionally depend on each other. In particular, for all $t > 1$, the value of $X_t$ is generated as a function of the values of $X_1, \dots, X_{t-1}$. The probability measure $\mathbf{P}$ over all the random variables is obtained by the product rule:

$$\mathbf{P}(X_1, \dots, X_T) := \prod_{t=1}^{T} \mathbf{P}(X_t|X_{<t}).$$

For notational convenience, we will use the letter $\mathbf{P}$ as a shorthand for the whole causal model.

The type of a random variable specifies whether it is an input or an output, and whether it is disclosed or undisclosed (Figure 3). Both distinctions give rise to 2 × 2 = 4 possible types. If a random variable $X_t$ is not an undisclosed input, then we say that it is observable. In this sense, being observable or not is not a type, but a property of the random variable. The operational significance of the type of random variables will become clear in the context of belief updates.
3.2.2 Belief updates

When an I/O system observes the value $x_t \in \mathcal{X}_t$ of a random variable $X_t$, its information state is updated. This update depends on whether $X_t$ is an input or an output. If $X_t$ is an input, then the update is logical. If $X_t$ is an output, then the update is causal. This difference is illustrated in Figure 4.



Figure 4: A logical versus a causal update. The figure shows three causally ordered random variables $X_1$, $X_2$ and $X_3$ (taking on binary values) and their probabilities (through the height of their boxes). Two updates are compared: the logical update $X_2 = 1$ and the causal update $X_2 \leftarrow 1$. These updates eliminate the incompatible probability mass (as shown in the first column after the update) and then normalize the remaining probability mass (second column after the update). Note that a logical update affects the probability mass of the whole history, eliminating the incompatible realizations, while a causal update affects only the probability mass of the present and the future.



A logical update models a measurement. As such, it provides information about the whole realization of the stochastic process. That is, learning the value of $X_t$ provides information about all $\{X_s : 1 \le s \le T\}$ through the dependencies established by the causal model for $P$. A logical update $X_t = x_t$ changes all conditional probabilities as

$$\mathbf{P}(A|B) \xrightarrow{X_t = x_t} \mathbf{P}(A|B, X_t = x_t),$$

where $A$ and $B$ are arbitrary events.

An axiomatic formalization of bounded rationality

The plausibility of observing a sequence $x_1, x_2, \dots, x_t$ (in this order) is given by

$$\mathbf{P}(x_1)\,\mathbf{P}(x_2|x_1)\,\mathbf{P}(x_3|x_1, x_2) \cdots \mathbf{P}(x_t|x_1, \dots, x_{t-1}) = \mathbf{P}(x_{\le t}),$$

where the last equality follows from basic probability calculus.

A causal update models a decision. As such, it only provides information about the future of the realization of the stochastic process, but not about its past. That is, learning the value of $X_t$ provides information about $\{X_s : t < s \le T\}$ only. Furthermore, the random variable $X_t$ is rendered independent from its past, thereby reflecting the autonomy of the decision. A causal update $X_t \leftarrow x_t$ changes all conditional probabilities as

$$\mathbf{P}(A|B) \xrightarrow{X_t \leftarrow x_t} \mathbf{P}(A|B, X_t \leftarrow x_t) = \mathbf{P}'(A|B, X_t = x_t),$$

where $A$ and $B$ are arbitrary events and where $\mathbf{P}'$ is the probability measure uniquely defined by the equations

i. $\mathbf{P}'(X_{<t}) = \mathbf{P}(X_{<t})$, (past)

ii. $\mathbf{P}'(X_t|X_{<t}) = \delta_{x_t}(X_t)$, (present) (8)

iii. $\mathbf{P}'(X_{t+1:T}|X_{\le t}) = \mathbf{P}(X_{t+1:T}|X_{\le t})$. (future)

When the random variable $X_t$ is clear from the context, we use the abbreviation

$$\mathbf{P}(A|B, \hat{x}_t) := \mathbf{P}(A|B, X_t \leftarrow x_t).$$

The propensity of generating a sequence $x_1, x_2, \dots, x_t$ (in this order) is given by

$$\mathbf{P}(x_1)\,\mathbf{P}(x_2|\hat{x}_1)\,\mathbf{P}(x_3|\hat{x}_1, \hat{x}_2) \cdots \mathbf{P}(x_t|\hat{x}_1, \dots, \hat{x}_{t-1}) = \mathbf{P}(x_{\le t}),$$

where the equality is obtained by using the definition of causal updates and then applying basic probability calculus.

When an I/O system does not observe the value $x_t \in \mathcal{X}_t$ of a random variable $X_t$ because it is an undisclosed input, its information state is not updated. That is, an unobserved update $X_t = x_t$ leaves all conditional probabilities unchanged, i.e.

$$\mathbf{P}(A|B) \longrightarrow \mathbf{P}(A|B),$$

where $A$ and $B$ are arbitrary events.
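The difference between the logical and the causal update can be reproduced on a small example in the spirit of Figure 4. The minimal sketch below assumes a hypothetical causal model over binary $X_1 \to X_2 \to X_3$; it implements the logical update $X_2 = 1$ by conditioning the joint, and the causal update $X_2 \leftarrow 1$ by building $\mathbf{P}'$ from the equations (8). Only the logical update changes the belief about the past variable $X_1$.

```python
from itertools import product

# Hypothetical causal model: P(X1), P(X2|X1), P(X3|X1, X2) over binary variables.
p_x1 = {0: 0.5, 1: 0.5}
p_x2 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # p_x2[x1][x2]
p_x3 = {(a, b): ({0: 0.7, 1: 0.3} if b == 0 else {0: 0.4, 1: 0.6})
        for a in (0, 1) for b in (0, 1)}          # p_x3[(x1, x2)][x3]
STATES = list(product((0, 1), repeat=3))


def normalize(q):
    """Renormalize, dropping zero-mass entries."""
    z = sum(q.values())
    return {k: v / z for k, v in q.items() if v > 0}


def logical_update(x2):
    """X2 = x2: condition the joint P(x1) P(x2|x1) P(x3|x1,x2) on the observation."""
    joint = {(a, b, c): p_x1[a] * p_x2[a][b] * p_x3[(a, b)][c] for a, b, c in STATES}
    return normalize({k: v for k, v in joint.items() if k[1] == x2})


def causal_update(x2):
    """X2 <- x2: build P' from (8) (past kept, present set to a delta, future kept).
    P' already puts all mass on X2 = x2, so conditioning only drops zero entries."""
    p_prime = {(a, b, c): p_x1[a] * (1.0 if b == x2 else 0.0) * p_x3[(a, b)][c]
               for a, b, c in STATES}
    return normalize(p_prime)


def marginal_x1(q):
    return {a: sum(v for k, v in q.items() if k[0] == a) for a in (0, 1)}


print(marginal_x1(logical_update(1)), marginal_x1(causal_update(1)))
```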

3.2.3 Deriving behavior from beliefs 

As anticipated previously, a causal model $\mathbf{P}$ of an I/O system gives rise to a probability measure $P$ characterizing the I/O system. The probability measure $P$ is derived from the causal model $\mathbf{P}$ as follows:

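Definition 6. Let $\mathbf{P}$ be a causal model of an I/O system. The associated I/O system $P$ is the I/O system recursively defined as

$$P(\epsilon) := 1, \qquad P(x_{\le t}) := P(x_{<t})\,\mathbf{P}(x_t|\mathrm{obs}(x_{<t})), \quad (9)$$

where the auxiliary function $\mathrm{obs}(\cdot)$ is given by $\mathrm{obs}(\epsilon) := \epsilon$ and

$$\mathrm{obs}(x_{\le t}) := \begin{cases}
\mathrm{obs}(x_{<t})\,\hat{x}_t & \text{if } X_t \text{ is an output},\\
\mathrm{obs}(x_{<t})\,x_t & \text{if } X_t \text{ is a disclosed input},\\
\mathrm{obs}(x_{<t}) & \text{if } X_t \text{ is an undisclosed input}.
\end{cases}$$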



In this definition, $\mathrm{obs}(x_{\le t})$ selects the values that the I/O system has observed at time $t + 1$, flagging them as either causal or logical belief updates. By construction, $P$ has the important property that for all $x_{\le t}$,

$$P(x_t|x_{<t}) = \mathbf{P}(x_t|\mathrm{obs}(x_{<t})).$$
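The auxiliary function $\mathrm{obs}(\cdot)$ is essentially a filter over the history. A minimal Python sketch, assuming a hypothetical string encoding of the three cases, tags outputs for a causal update (`"do"`, standing for $\hat{x}_t$), tags disclosed inputs for a logical update (`"see"`), and drops undisclosed inputs.

```python
OUTPUT = "output"                        # hypothetical type labels
DISCLOSED_INPUT = "disclosed input"
UNDISCLOSED_INPUT = "undisclosed input"


def obs(history, types):
    """Return obs(x_{<=t}) as a list of (update kind, value) pairs.

    history[k] is the value of the (k+1)-th variable and types[k] is its type.
    """
    observed = []
    for x, kind in zip(history, types):
        if kind == OUTPUT:
            observed.append(("do", x))   # causal update X_t <- x_t
        elif kind == DISCLOSED_INPUT:
            observed.append(("see", x))  # logical update X_t = x_t
        elif kind != UNDISCLOSED_INPUT:
            raise ValueError(f"unknown type: {kind}")
        # undisclosed inputs are skipped: the information state is not updated
    return observed


# Example: a hidden parameter (undisclosed input), an action (output), an observation.
print(obs(["theta_2", "a", "o"], [UNDISCLOSED_INPUT, OUTPUT, DISCLOSED_INPUT]))
```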

3.3 The variational principle in I/O systems 

Let us assume that we are in possession of a reference I/O system $P_0$ (or its causal model $\mathbf{P}_0$) encoding our current knowledge. Suppose that we wish to convert $P_0$ into an I/O system $P$ maximizing a given target utility function $U_*$. We assume further that $P_0$, $P$ and $U_*$ share their random variables, including the causal order and types. As we have argued previously, any transformation of a probability measure incurs costs. These costs can potentially be so high that they jeopardize the benefits of naively maximizing the expectation of $U_*$. We therefore seek an optimality principle that allows finding a probability measure $P$ that trades off the benefits against the costs of this transformation. In accordance with Section 2.3, we first note that the transformation of the reference I/O system $P_0$ into $P$ due to the addition of constraints $U_*$ can be expressed as a change in free utility characterized by Equation (3). The free utility functional for a given conjugate pair $(P, U)$ can be expressed as follows:

$$J(P; U) := \sum_{x_{\le T}} P(x_{\le T})\,U(x_{\le T}) - \alpha \sum_{x_{\le T}} P(x_{\le T}) \log P(x_{\le T}).$$

In Section 2.3 we have also emphasized that there are two variational problems, namely the control and the estimation problem, that arise depending on the givens and the unknowns of the variation. Naturally, this distinction carries over to the case of probability distributions representing I/O systems.

Suppose for simplicity that $T = 1$. Thus, we have to find an I/O system $P$ over a single random variable $X := X_1$ taking on values in $\mathcal{X} := \mathcal{X}_1$. Again, we write down the difference in free utility, but identifying the givens with $P_0$ and the unknowns with $\Pr$. This yields the following two problems.

1. Control. If we are searching for a probability law $P$ that fulfills the constraints given by the maximization of $U_*$ and the minimization of the cost of the transformation $P_0 \to P$, then we use Equation (4), i.e.

$$P = \arg\max_{\Pr} \sum_{x \in \mathcal{X}} \Pr(x)\,U_*(x) - \alpha \sum_{x \in \mathcal{X}} \Pr(x) \log \frac{\Pr(x)}{P_0(x)}.$$






2. Estimation. If we are searching for the best estimate $P$ of the probability law $P_0$ under the constraints $U_*$, then we use Equation (5), i.e.

$$P = \arg\min_{\Pr} \sum_{x \in \mathcal{X}} P_0(x) \log \frac{P_0(x)}{\Pr(x)}.$$

The same idea extends to the case where T > 1 , obtaining a functional for the difference 
in free utility that spans all the random variables. A simple way to do this is again by 
recursively defining two auxiliary probability measures G and R as 



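$$G(\epsilon) := 1, \qquad G(x_{\le t}) := \begin{cases} G(x_{<t})\,P_r(x_t|x_{<t}) & \text{if } X_t \text{ is controlled}, \\ G(x_{<t})\,P_0(x_t|x_{<t}) & \text{if } X_t \text{ is estimated}; \end{cases}$$

$$R(\epsilon) := 1, \qquad R(x_{\le t}) := \begin{cases} R(x_{<t})\,P_0(x_t|x_{<t}) & \text{if } X_t \text{ is controlled}, \\ R(x_{<t})\,P_r(x_t|x_{<t}) & \text{if } X_t \text{ is estimated}. \end{cases} \qquad (10)$$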



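Then, it is straightforward to see that the difference in free utility leads to the variational problem

$$\arg\max_{P_r} \left\{ \sum_{x_{\le T}} G(x_{\le T})\,U_*(x_{\le T}) - \alpha \sum_{x_{\le T}} G(x_{\le T}) \log\frac{G(x_{\le T})}{R(x_{\le T})} \right\}. \qquad (11)$$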
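The recursion (10) and the functional (11) can be written out directly for small binary alphabets. In the Python sketch below, which is only an illustration, $P_r$ and $P_0$ are passed as hypothetical functions returning conditional probabilities $P(x_t|x_{<t})$, and a list of flags marks which variables are controlled. With every variable controlled, (11) reduces to expected utility under $P_r$ minus $\alpha\,\mathrm{KL}(P_r\|P_0)$; with every variable estimated, it reduces to expected utility under $P_0$ minus $\alpha\,\mathrm{KL}(P_0\|P_r)$, matching the two $T = 1$ cases.

```python
import itertools
import math

def g_and_r(seq, pr, p0, controlled):
    """Build G(x_{<=T}) and R(x_{<=T}) recursively, starting from G(eps) = R(eps) = 1."""
    g = r = 1.0
    for t, x in enumerate(seq):
        prefix = seq[:t]
        if controlled[t]:      # controlled variable: G uses P_r, R uses P0
            g *= pr(x, prefix)
            r *= p0(x, prefix)
        else:                  # estimated variable: G uses P0, R uses P_r
            g *= p0(x, prefix)
            r *= pr(x, prefix)
    return g, r

def free_utility_difference(pr, p0, u, alpha, controlled, alphabet=(0, 1)):
    """sum G U* - alpha * sum G log(G/R) over all sequences; assumes R > 0 wherever G > 0."""
    total = 0.0
    for seq in itertools.product(alphabet, repeat=len(controlled)):
        g, r = g_and_r(seq, pr, p0, controlled)
        if g > 0:
            total += g * u(seq) - alpha * g * math.log(g / r)
    return total

# Hypothetical example: i.i.d. binary variables, the utility counts the ones.
def pr(x, prefix):
    return 0.7 if x == 1 else 0.3

def p0(x, prefix):
    return 0.5

def u(seq):
    return float(sum(seq))

for flags in ([True, True], [False, False], [True, False]):
    print(flags, free_utility_difference(pr, p0, u, 1.0, flags))
```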



4. Applications 



In the following, we illustrate applications of the variational principle for I/O systems in
Equation (11) by deriving solutions to three problems: optimal control, adaptive estimation
and adaptive control.

Let $\mathcal{A}$ and $\mathcal{O}$ be two finite sets, the first being the set of actions and the second being
the set of observations. Furthermore, let $\Theta$ be a finite set called the set of parameters.
The set $\mathcal{Z} := \mathcal{A} \times \mathcal{O}$ is called the set of interactions, and a pair $(a, o) \in \mathcal{Z}$ is an
interaction. We will underline symbols to glue them together as in $\underline{ao}_{\le t} := a_1 o_1 \ldots a_t o_t$
to abbreviate strings of interactions. Let $P$ and $Q$ be I/O systems. By convention, we
will consider $P$ the system to be designed and $Q$ an external system to be interfaced.
Accordingly, we call $P$ the agent, and $Q$ the environment.

Consider the following interaction protocol. Initially, $Q$ chooses a parameter $\theta \in \Theta$
unbeknownst to $P$. Then, the interaction proceeds in cycles $t = 1, 2, \ldots, T$. In cycle $t$, $P$
randomly chooses a value $a_t$ for the random variable $A_t$ from the set of actions $\mathcal{A}$ conditioned
on the past I/O symbols $\underline{ao}_{<t}$. $Q$ responds by choosing a value $o_t$ for the random variable
$O_t$ from the set of observations $\mathcal{O}$ conditioned on $\theta$ and the past I/O symbols $\underline{ao}_{<t}a_t$. Then the
next cycle starts. This interaction protocol determines a probability law over the causally
ordered random variables $\Theta, A_1, O_1, \ldots, A_T, O_T$ defined as follows:

$$\theta \sim Q(\theta), \qquad a_t|\theta, \underline{ao}_{<t} \sim P(a_t|\underline{ao}_{<t}), \qquad o_t|\theta, \underline{ao}_{<t}a_t \sim Q(o_t|\theta, \underline{ao}_{<t}a_t).$$

Note that with respect to $P$, $\Theta$ is a latent variable, $A_1, \ldots, A_T$ are outputs and $O_1, \ldots, O_T$
are observable inputs. Similarly, for $Q$, $\Theta, O_1, \ldots, O_T$ are outputs, and $A_1, \ldots, A_T$ are
observable inputs. This interaction protocol, as known by $P$, is summarized in Table 1.
The applications in the following use this protocol or a simplification of it.
Table 1: The standard interaction protocol as seen by the agent $P$.
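The protocol can also be read as a sampling procedure. The Python sketch below draws one run of it; the distributions are passed as hypothetical callables that return dictionaries of probabilities, and the agent's policy receives only the interaction history, never $\theta$, mirroring the fact that $\Theta$ is latent for $P$. The names `run_protocol`, `agent` and `environment` are illustrative.

```python
import random

def sample(dist, rng):
    """Draw a key from a dict mapping outcomes to probabilities."""
    keys = list(dist)
    return rng.choices(keys, weights=[dist[k] for k in keys])[0]

def run_protocol(prior, agent, environment, T, rng):
    """One run: theta ~ Q(theta); then a_t ~ P(.|ao_<t) and o_t ~ Q(.|theta, ao_<t a_t)."""
    theta = sample(prior, rng)      # chosen by Q and hidden from the agent
    history = ()                    # ao_<t as a tuple of (a, o) pairs
    for _ in range(T):
        a = sample(agent(history), rng)                  # the agent sees only the history
        o = sample(environment(theta, history, a), rng)  # the environment also sees theta
        history += ((a, o),)
    return theta, history

# Hypothetical example: two coins; the observation is 1 with probability theta.
prior = {0.2: 0.5, 0.8: 0.5}

def agent(history):
    return {"flip": 1.0}

def environment(theta, history, a):
    return {1: theta, 0: 1 - theta}

theta, history = run_protocol(prior, agent, environment, T=5, rng=random.Random(0))
print(theta, history)
```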
4.1 Optimal control

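In optimal control problems it is generally assumed that we are given a utility function $U_*$
and that the environment is fully known, i.e. $P_0(o_t|\underline{ao}_{<t}a_t) = Q(o_t|\underline{ao}_{<t}a_t)$. The choice of
the parameter $\theta$ can be omitted. The probability measures $G$ and $R$ are given by

$$G(a_t|\underline{ao}_{<t}) = P_r(a_t|\underline{ao}_{<t}), \qquad G(o_t|\underline{ao}_{<t}a_t) = P_0(o_t|\underline{ao}_{<t}a_t),$$
$$R(a_t|\underline{ao}_{<t}) = P_0(a_t|\underline{ao}_{<t}), \qquad R(o_t|\underline{ao}_{<t}a_t) = P_r(o_t|\underline{ao}_{<t}a_t).$$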



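Hence, the variational problem to find $P$ is to maximize the functional

$$\sum_{\underline{ao}_{\le T}} G(\underline{ao}_{\le T}) \left\{ \sum_{t=1}^{T} \Big[ U_*(a_t|\underline{ao}_{<t}) + U_*(o_t|\underline{ao}_{<t}a_t) \Big] - \alpha \sum_{t=1}^{T} \left[ \log\frac{P_r(a_t|\underline{ao}_{<t})}{P_0(a_t|\underline{ao}_{<t})} + \log\frac{P_0(o_t|\underline{ao}_{<t}a_t)}{P_r(o_t|\underline{ao}_{<t}a_t)} \right] \right\}, \qquad (12)$$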



which results from substituting $G$ and $R$ into (11) and applying the equalities

$$\frac{G(a_t|\underline{ao}_{<t})}{R(a_t|\underline{ao}_{<t})} = \frac{P_r(a_t|\underline{ao}_{<t})}{P_0(a_t|\underline{ao}_{<t})}, \qquad \frac{G(o_t|\underline{ao}_{<t}a_t)}{R(o_t|\underline{ao}_{<t}a_t)} = \frac{P_0(o_t|\underline{ao}_{<t}a_t)}{P_r(o_t|\underline{ao}_{<t}a_t)},$$

which are easily derived using (8) repeatedly. The important observation is that (12) can be
seen as a concise way of expressing a collection of independent variational problems, one for
each random variable. In the variational problem for the observation probabilities we can
disregard the constraint utilities and the resource costs of the action probabilities. The
$t$-th summand of the functional can then be written as



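$$-\alpha \sum_{\underline{ao}_{<t}a_t} G(\underline{ao}_{<t}a_t) \sum_{o_t} P_0(o_t|\underline{ao}_{<t}a_t) \log\frac{P_0(o_t|\underline{ao}_{<t}a_t)}{P_r(o_t|\underline{ao}_{<t}a_t)}.$$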



Since varying $P_r(o_t|\underline{ao}_{<t}a_t)$ does not influence the summands at times $\tau \neq t$, the optimal
solution to this minimum relative entropy problem is trivially obtained by $P(o_t|\underline{ao}_{<t}a_t) =
Q(o_t|\underline{ao}_{<t}a_t)$. The variational problem with respect to the action probabilities is somewhat
more intricate, since varying the first action probability, for example, has an impact
on all subsequent conditional action probabilities. The functional (12) can be expanded
recursively, yielding
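$$\begin{aligned}
&\sum_{a_1} P_r(a_1) \Big[ U_*(a_1) - \alpha \log\frac{P_r(a_1)}{P_0(a_1)} + \sum_{o_1} P(o_1|a_1) \Big[ U_*(o_1|a_1) \\
&\quad + \sum_{a_2} P_r(a_2|\underline{ao}_1) \Big[ U_*(a_2|\underline{ao}_1) - \alpha \log\frac{P_r(a_2|\underline{ao}_1)}{P_0(a_2|\underline{ao}_1)} + \sum_{o_2} P(o_2|\underline{ao}_1 a_2) \Big[ U_*(o_2|\underline{ao}_1 a_2) \\
&\quad + \cdots \\
&\quad + \sum_{a_T} P_r(a_T|\underline{ao}_{<T}) \Big[ U_*(a_T|\underline{ao}_{<T}) - \alpha \log\frac{P_r(a_T|\underline{ao}_{<T})}{P_0(a_T|\underline{ao}_{<T})} + \sum_{o_T} P(o_T|\underline{ao}_{<T}a_T)\,U_*(o_T|\underline{ao}_{<T}a_T) \Big] \cdots \Big]\Big]\Big]\Big].
\end{aligned}$$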



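The innermost variational problem is of the form

$$\sum_{a_T} P_r(a_T|\underline{ao}_{<T}) \left[ U_*(a_T|\underline{ao}_{<T}) + \sum_{o_T} P(o_T|\underline{ao}_{<T}a_T)\,U_*(o_T|\underline{ao}_{<T}a_T) - \alpha \log\frac{P_r(a_T|\underline{ao}_{<T})}{P_0(a_T|\underline{ao}_{<T})} \right].$$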

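As discussed previously, its solution is

$$P(a_T|\underline{ao}_{<T}) = \frac{1}{Z^\alpha(\underline{ao}_{<T})}\,P_0(a_T|\underline{ao}_{<T}) \exp\left\{ \frac{1}{\alpha} \left[ U_*(a_T|\underline{ao}_{<T}) + \sum_{o_T} P(o_T|\underline{ao}_{<T}a_T)\,U_*(o_T|\underline{ao}_{<T}a_T) \right] \right\},$$

where $Z^\alpha(\underline{ao}_{<T})$ is the normalizing constant, also known as the partition function. Similarly,
the action probabilities $P(a_t|\underline{ao}_{<t})$ for earlier time steps can be obtained as

$$P(a_t|\underline{ao}_{<t}) = \frac{1}{Z^\alpha(\underline{ao}_{<t})}\,P_0(a_t|\underline{ao}_{<t}) \exp\left\{ \frac{1}{\alpha} U_*(a_t|\underline{ao}_{<t}) + \sum_{o_t} P(o_t|\underline{ao}_{<t}a_t) \left[ \frac{1}{\alpha} U_*(o_t|\underline{ao}_{<t}a_t) + \log Z^\alpha(\underline{ao}_{\le t}) \right] \right\},$$

where $Z^\alpha(\underline{ao}_{\le t})$ are the normalizing constants obtained for the subsequent time step. This
way the optimal action probabilities can be computed recursively.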
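The backward recursion can be sketched in a few lines of Python. The sketch below assumes finite action and observation sets, a known environment $P(o_t|\underline{ao}_{<t}a_t)$, a reference policy $P_0$ with full support, and the convention $\log Z^\alpha = 0$ once the horizon $T$ is reached, so that the last step reduces to the formula for $P(a_T|\underline{ao}_{<T})$. The function `make_controller` and its arguments are hypothetical. Memoizing $\log Z^\alpha$ over histories avoids recomputing subtrees, but every history is still enumerated, so this is only practical for short horizons.

```python
import math
from functools import lru_cache

def make_controller(actions, observations, T, p0_action, p_obs, u_action, u_obs, alpha):
    """Hypothetical helper returning (policy, log_z) for the recursion in the text.

    p0_action(a, h): reference policy P0(a_t|ao_<t), assumed > 0 for every action
    p_obs(o, h, a):  known environment P(o_t|ao_<t a_t)
    u_action(a, h), u_obs(o, h, a): conditional utilities U*(a_t|.) and U*(o_t|.)
    Histories h are tuples of (a, o) pairs.
    """

    def exponent(a, history):
        # (1/alpha) U*(a_t|.) + sum_o P(o_t|.) [(1/alpha) U*(o_t|.) + log Z(ao_<=t)]
        e = u_action(a, history) / alpha
        for o in observations:
            e += p_obs(o, history, a) * (u_obs(o, history, a) / alpha + log_z(history + ((a, o),)))
        return e

    @lru_cache(maxsize=None)
    def log_z(history):
        # log Z^alpha(ao_<t); by convention log Z = 0 once the horizon T is reached
        if len(history) == T:
            return 0.0
        scores = [math.log(p0_action(a, history)) + exponent(a, history) for a in actions]
        m = max(scores)
        return m + math.log(sum(math.exp(s - m) for s in scores))

    def policy(history):
        # P(a_t|ao_<t) = P0(a_t|ao_<t) exp(exponent) / Z^alpha(ao_<t)
        lz = log_z(history)
        return {a: math.exp(math.log(p0_action(a, history)) + exponent(a, history) - lz)
                for a in actions}

    return policy, log_z

# Hypothetical toy problem: the observation copies the action with probability 0.8,
# observing 1 is rewarded, and the reference policy P0 is uniform.
def p_obs(o, h, a):
    return 0.8 if o == a else 0.2

policy, log_z = make_controller(
    actions=(0, 1), observations=(0, 1), T=2,
    p0_action=lambda a, h: 0.5,
    p_obs=p_obs,
    u_action=lambda a, h: 0.0,
    u_obs=lambda o, h, a: 1.0 if o == 1 else 0.0,
    alpha=0.5,
)
print(policy(()))        # first-step action probabilities; action 1 is favored
print(0.5 * log_z(()))   # soft value alpha * log Z^alpha at the root
```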

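This result allows us to recover the maximum expected utility solution and, more
specifically, the dynamic programming solution. Identify the value function as $V^\alpha(\underline{ao}_{<t}) :=
\alpha \log Z^\alpha(\underline{ao}_{<t})$, and the instantaneous rewards as $r(a_t|\underline{ao}_{<t}) := U_*(a_t|\underline{ao}_{<t})$ and $r(o_t|\underline{ao}_{<t}a_t) :=
U_*(o_t|\underline{ao}_{<t}a_t)$. If one takes the limit $\alpha \to 0$ (with $P_0$ assigning positive probability to every
action), then $P(a_t|\underline{ao}_{<t}) \to \delta_{a^*}(a_t)$, where

$$a^* := \arg\max_{a_t} \left\{ r(a_t|\underline{ao}_{<t}) + \sum_{o_t} P(o_t|\underline{ao}_{<t}a_t) \Big[ r(o_t|\underline{ao}_{<t}a_t) + V^0(\underline{ao}_{\le t}) \Big] \right\},$$

and where the value $V^0(\underline{ao}_{<t})$ turns out to be given by the recursive formula

$$V^0(\underline{ao}_{<t}) = \max_{a_t} \left\{ r(a_t|\underline{ao}_{<t}) + \sum_{o_t} P(o_t|\underline{ao}_{<t}a_t) \Big[ r(o_t|\underline{ao}_{<t}a_t) + V^0(\underline{ao}_{\le t}) \Big] \right\}.$$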



Taking the limit $\alpha \to \infty$ puts all the emphasis of the variational problem on the resource
costs. This case yields

$$P(a_t|\underline{ao}_{<t}) = P_0(a_t|\underline{ao}_{<t}),$$

as expected.
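Both limits can be observed numerically at a single decision step. The Python sketch below takes hypothetical action values $q(a_t)$, standing in for the bracketed term $r(a_t|\underline{ao}_{<t}) + \sum_{o_t} P(o_t|\underline{ao}_{<t}a_t)[r(o_t|\underline{ao}_{<t}a_t) + V(\underline{ao}_{\le t})]$, and computes $P \propto P_0\,e^{q/\alpha}$ together with the soft value $\alpha \log Z^\alpha$. As $\alpha$ shrinks, the policy concentrates on the maximizing action and the soft value approaches $\max_a q(a)$; as $\alpha$ grows, the policy approaches $P_0$.

```python
import math

def soft_policy(p0, q, alpha):
    """Return P(a) proportional to P0(a) exp(q(a)/alpha) and the soft value alpha * log Z."""
    scores = [math.log(p) + qa / alpha for p, qa in zip(p0, q)]
    m = max(scores)                                   # log-sum-exp shift
    z_shifted = sum(math.exp(s - m) for s in scores)
    policy = [math.exp(s - m) / z_shifted for s in scores]
    soft_value = alpha * (m + math.log(z_shifted))    # alpha * log Z
    return policy, soft_value

p0 = [0.2, 0.5, 0.3]   # hypothetical reference policy with full support
q = [1.0, 0.4, 0.9]    # hypothetical action values
for alpha in (1e4, 1.0, 1e-3):
    policy, value = soft_policy(p0, q, alpha)
    print(alpha, [round(p, 4) for p in policy], round(value, 4))
```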






Ortega & Braun 



4.2 Adaptive estimation 

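In an adaptive estimation problem one is confronted with an unknown symbol source
$P_0(o_t|\theta, o_{<t}) = Q(o_t|\theta, o_{<t})$ indexed by $\theta \in \Theta$ and chosen randomly as $P_0(\theta) = Q(\theta)$.
For this observation problem we can disregard the action variables and set $U_* = 0$. The
probability measures $G$ and $R$ are given by

$$G(\theta) = P_0(\theta), \qquad G(o_t|\theta, o_{<t}) = P_0(o_t|\theta, o_{<t}),$$
$$R(\theta) = P_r(\theta), \qquad R(o_t|\theta, o_{<t}) = P_r(o_t|o_{<t}).$$

Substituting these distributions into (11) yields

$$-\alpha \sum_{\theta, o_{\le T}} P_0(\theta) \prod_{t=1}^{T} P_0(o_t|\theta, o_{<t}) \left[ \log\frac{P_0(\theta)}{P_r(\theta)} + \sum_{t=1}^{T} \log\frac{P_0(o_t|\theta, o_{<t})}{P_r(o_t|o_{<t})} \right].$$

For the parameter $\theta$, the first term is a relative entropy between $P_0(\theta)$ and $P_r(\theta)$, so we see that

$$P(\theta) = P_0(\theta),$$

and that the $t$-th summand of the functional can then be written as

$$-\alpha \sum_{\theta, o_{<t}} P_0(\theta) \prod_{\tau=1}^{t-1} P_0(o_\tau|\theta, o_{<\tau}) \sum_{o_t} P_0(o_t|\theta, o_{<t}) \log\frac{P_0(o_t|\theta, o_{<t})}{P_r(o_t|o_{<t})}.$$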



The solution to this variational problem is well known in the literature (Haussler and Opper,
1997; Opper, 1998): it is given by the predictive distribution

$$P(o_t|o_{<t}) = \sum_{\theta} P_0(\theta|o_{<t})\,P_0(o_t|\theta, o_{<t}),$$

where the posterior $P_0(\theta|o_{<t})$ is computed according to Bayes' rule.
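A minimal Python sketch of this predictor, assuming a finite parameter set and a hypothetical `likelihood` callable for $P_0(o_t|\theta, o_{<t})$: before each symbol it forms the mixture $\sum_\theta P_0(\theta|o_{<t})\,P_0(o_t|\theta, o_{<t})$, and after the symbol it updates the posterior with Bayes' rule. The Bernoulli sources in the usage example are illustrative.

```python
def bayes_predictive(prior, likelihood, alphabet, observations):
    """Return the predictive distributions P(o_t|o_<t) for t = 1..T and the final posterior."""
    posterior = dict(prior)          # P0(theta | o_<t), initially the prior P0(theta)
    predictions = []
    for t, o in enumerate(observations):
        past = tuple(observations[:t])
        # predictive mixture: sum_theta P0(theta|o_<t) P0(x|theta, o_<t)
        pred = {x: sum(w * likelihood(x, th, past) for th, w in posterior.items())
                for x in alphabet}
        predictions.append(pred)
        # Bayes' rule: P0(theta|o_<=t) proportional to P0(theta|o_<t) P0(o_t|theta, o_<t)
        unnorm = {th: w * likelihood(o, th, past) for th, w in posterior.items()}
        z = sum(unnorm.values())
        posterior = {th: w / z for th, w in unnorm.items()}
    return predictions, posterior

# Hypothetical example: i.i.d. Bernoulli sources with bias theta.
def bernoulli(x, theta, past):
    return theta if x == 1 else 1.0 - theta

prior = {0.2: 1 / 3, 0.5: 1 / 3, 0.8: 1 / 3}
predictions, posterior = bayes_predictive(prior, bernoulli, (0, 1), [1] * 20)
print(predictions[0][1], predictions[-1][1], posterior[0.8])
```

After a run of ones the posterior concentrates on $\theta = 0.8$ and the predictive probability of a one approaches 0.8.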

4.3 Adaptive control 

In adaptive control problems the environment is not known a priori, but is known to belong
to a set of possible environments $P_0(o_t|\theta, \underline{ao}_{<t}a_t) = Q(o_t|\theta, \underline{ao}_{<t}a_t)$ indexed by $\theta \in \Theta$ and
chosen randomly as $P_0(\theta) := Q(\theta)$. We have also seen that quantities that are estimated
require the solution of a variational problem that is local in time, in contrast to quantities
that are controlled, which require the solution of a variational problem that stretches over
the whole future. Can we devise an adaptive controller that is based on pure estimation?

If we also happen to know a set of controllers $P_0(a_t|\theta, \underline{ao}_{<t})$ for each of these
environments (for instance, constructed previously by solving the individual optimal control
problems), then a Bayesian rule for control can be devised; compare Ortega and Braun
(2010). Since constraint utilities do not matter in pure estimation problems, we impose
$U_* = 0$ for the sake of simplicity. The probability measures $G$ and $R$ are given by

$$G(\theta) = P_0(\theta), \quad G(a_t|\theta, \underline{ao}_{<t}) = P_0(a_t|\theta, \underline{ao}_{<t}), \quad G(o_t|\theta, \underline{ao}_{<t}a_t) = P_0(o_t|\theta, \underline{ao}_{<t}a_t),$$
$$R(\theta) = P_r(\theta), \quad R(a_t|\theta, \underline{ao}_{<t}) = P_r(a_t|\underline{ao}_{<t}), \quad R(o_t|\theta, \underline{ao}_{<t}a_t) = P_r(o_t|\underline{ao}_{<t}a_t).$$
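Inserting them into (11) yields

$$-\alpha \sum_{\theta, \underline{ao}_{\le T}} P_0(\theta) \left[ \prod_{t=1}^{T} P_0(a_t|\theta, \underline{ao}_{<t})\,P_0(o_t|\theta, \underline{ao}_{<t}a_t) \right] \left[ \log\frac{P_0(\theta)}{P_r(\theta)} + \sum_{t=1}^{T} \left( \log\frac{P_0(a_t|\theta, \underline{ao}_{<t})}{P_r(a_t|\underline{ao}_{<t})} + \log\frac{P_0(o_t|\theta, \underline{ao}_{<t}a_t)}{P_r(o_t|\underline{ao}_{<t}a_t)} \right) \right].$$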



Again, for the parameter $\theta$, we see that

$$P(\theta) = P_0(\theta).$$

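For the variational problem of the observation at time $t$ we can again disregard the resource
costs of the actions. Analogous to the solution for adaptive estimation, the variational
problem is equivalent to

$$-\alpha \sum_{\theta, \underline{ao}_{<t}a_t} P_0(\theta) \left[ \prod_{\tau=1}^{t-1} P_0(a_\tau|\theta, \underline{ao}_{<\tau})\,P_0(o_\tau|\theta, \underline{ao}_{<\tau}a_\tau) \right] P_0(a_t|\theta, \underline{ao}_{<t}) \sum_{o_t} P_0(o_t|\theta, \underline{ao}_{<t}a_t) \log\frac{P_0(o_t|\theta, \underline{ao}_{<t}a_t)}{P_r(o_t|\underline{ao}_{<t}a_t)},$$

which is solved by the predictive distribution

$$P(o_t|\underline{ao}_{<t}a_t) = \sum_{\theta} P_0(\theta|\underline{ao}_{<t}a_t)\,P_0(o_t|\theta, \underline{ao}_{<t}a_t).$$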

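For actions, the procedure is identical. Thus, the variational problem for the $t$-th action is
given by

$$-\alpha \sum_{\theta, \underline{ao}_{<t}} P_0(\theta) \left[ \prod_{\tau=1}^{t-1} P_0(a_\tau|\theta, \underline{ao}_{<\tau})\,P_0(o_\tau|\theta, \underline{ao}_{<\tau}a_\tau) \right] \sum_{a_t} P_0(a_t|\theta, \underline{ao}_{<t}) \log\frac{P_0(a_t|\theta, \underline{ao}_{<t})}{P_r(a_t|\underline{ao}_{<t})},$$

again solved by the predictive distribution

$$P(a_t|\underline{ao}_{<t}) = \sum_{\theta} P_0(\theta|\underline{ao}_{<t})\,P_0(a_t|\theta, \underline{ao}_{<t}).$$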



This result has been previously reported as the Bayesian control rule (Ortega and Braun,
2008; Braun and Ortega, 2010). By sampling from the predictive distribution $P(a_{t+1}|\underline{ao}_{\le t})$,
the agent can solve adaptive control problems, such as bandit problems, adaptive linear
quadratic control problems and Markov decision problems with unknown transition
matrices.
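The Python sketch below illustrates the rule on a two-armed Bernoulli bandit under several assumptions of ours: a finite set of hypothetical environments $\theta$, each given as a tuple of arm success probabilities strictly between 0 and 1; controllers $P_0(a_t|\theta, \underline{ao}_{<t})$ that deterministically pull the best arm under $\theta$; and a posterior updated only with the observation likelihoods, treating the agent's own actions as carrying no evidence about $\theta$. The text above does not spell out how the posterior treats actions, so this last point is an assumption. Drawing $\theta$ from the posterior and then acting with its controller produces a sample from $\sum_\theta P_0(\theta|\underline{ao}_{<t})\,P_0(a_t|\theta, \underline{ao}_{<t})$.

```python
import random

def bayesian_control_rule(thetas, prior, true_arms, T, rng):
    """Run T rounds of a Bernoulli bandit with a finite set of hypothetical environments.

    thetas[i]: success probability of each arm under environment i (all in (0, 1)).
    Assumed controller P0(a|theta): always pull the best arm under theta.
    Assumed posterior update: only the observation likelihood Q(o|theta, a) enters.
    """
    posterior = list(prior)
    actions = []
    for _ in range(T):
        # Sampling theta from the posterior, then a from P0(.|theta),
        # draws a from sum_theta P(theta|ao_<t) P0(a|theta).
        i = rng.choices(range(len(thetas)), weights=posterior)[0]
        a = max(range(len(thetas[i])), key=lambda k: thetas[i][k])
        o = 1 if rng.random() < true_arms[a] else 0          # environment responds
        likes = [th[a] if o == 1 else 1.0 - th[a] for th in thetas]
        posterior = [w * lk for w, lk in zip(posterior, likes)]
        z = sum(posterior)
        posterior = [w / z for w in posterior]
        actions.append(a)
    return actions, posterior

thetas = [(0.7, 0.3), (0.3, 0.7)]   # hypothetical environments
actions, posterior = bayesian_control_rule(thetas, [0.5, 0.5], thetas[1], 200, random.Random(1))
print(posterior, sum(actions[-50:]))  # posterior mass; pulls of arm 1 in the last 50 rounds
```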

5. Discussion 

In this study we have used causal models to construct probability distributions representing
I/O systems. As I/O systems both process input symbols and generate output symbols,
their characterization requires both evidential and generative probabilities. The evidential
probabilities ("plausibilities" in the subjectivist sense of probability) allow the I/O system
to infer properties about the input stream, while the generative probabilities ("propensities"
in the frequentist sense) prescribe the law to generate its output stream. The importance of
distinguishing between input and output, more commonly known as the difference between
seeing and doing, and their impact on inference, lies at the heart of statistical causality
(Pearl, 2000; Spirtes and Scheines, 2001).

Based on the equivalence of information and utility, we have devised a variational
principle to construct I/O systems. Structural similarities between utilities and information
have been previously reported in the literature (Candeal et al., 2001). For the case of
known environments, a duality between optimal control and estimation has been previously
reported by Todorov (2008), where an exponential transformation mediates between the
cost-to-go function and a probability distribution that acts as a backwards filter. For
the case of optimally learning systems in unknown environments, a duality between
utility and information has been reported by Belavkin (2008), considering the problem of
optimal learning as a variational problem of expected utility maximization with dynamical
information constraints. An information-theoretic approach to interactive learning based
on principles from statistical physics has also been proposed by Still (2009). The use of
the Kullback-Leibler divergence to measure deviations from a reference distribution as a
cost function for control has been previously proposed by Todorov (2006, 2009) and by
Kappen, Gomez, and Opper (2009). In these studies, transition probabilities of Markov
systems were manipulated directly and the cost was measured as a probabilistic deviation with
respect to the passive dynamics of the system. Adaptive controllers based on the minimum
relative entropy principle have been previously reported in Ortega and Braun (2008) and
Braun and Ortega (2010). The contribution of our study is to devise a single axiomatic
framework that allows for the solution of both control and adaptation problems based on
the equivalence of utility and information. This axiomatic framework leads to a single
variational principle to solve both problems. The resulting controllers optimize a trade-off
between maximization of a target utility function and resource costs, and can hence be
interpreted as bounded-rational actors.

The idea of bounded rationality through the consideration of information costs was
first proposed by Simon (1982). In game theory, information theory has been proposed to
formalize bounded rational players whose degree of rationality is given by a temperature
parameter trading off entropy and payoff (Wolpert, 2004). The distinction between disclosed
and undisclosed information has also been studied extensively in the game-theory
literature on problems of incomplete or imperfect information (see Gibbons, 1992;
Osborne and Rubinstein, 1999). Like these previous studies, our work has obvious
connections to information theory (Shannon, 1948), thermodynamics (see e.g. Callen, 1985)
and statistical inference (see e.g. the maximum entropy principles in Jaynes and Bretthorst,
2003).



6. Conclusions 

The main contribution of the current paper is to derive axiomatically a framework for 
bounded rationality. We propose to formalize agents as probability distributions over 
I/O streams. Based on the idea that a free system produces an outcome with higher 
probability if and only if it is more desirable, we postulate three simple axioms relating 
utilities and probabilities. We show that these axioms enforce a unique conversion law 
between utility and probability (and thereby, information). Moreover, we show that this
relation can be characterized as a variational principle: given a utility function, its conjugate 
probability measure maximizes the free utility functional. We exhibit how constrained 
transformations of probability measures can be characterized as a change in free utility and 
use this to formulate a model of bounded rationality. Accordingly, one obtains a variational 
principle to choose a probability measure that trades off the maximization of a target utility 
function and the cost of the deviation from a reference distribution. We show that optimal 
control, adaptive estimation and adaptive control problems can be solved this way in a 
resource-efficient way. When resource costs are ignored, the MEU principle is recovered. 
Our formalization might thus provide a principled approach to bounded rationality that 
establishes a link to information theory. 

References 

Anscombe, F. J., and Aumann, R. J. 1963. A Definition of Subjective Probability. The
Annals of Mathematical Statistics 34(1):199-205.

Belavkin, R. 2008. The duality of utility and information in optimally learning systems. In 
Proceedings of the 7th IEEE Conference on Cybernetics and Intelligent Systems, 1-6. 

Braun, D., and Ortega, P. 2010. A minimum relative entropy principle for adaptive control 
in linear quadratic regulators. In Proceedings of the 7th international conference on 
informatics in control, automation and robotics, (in press). 

Callen, H. 1985. Thermodynamics and an Introduction to Thermostatistics. John Wiley &
Sons, 2nd edition.

Candeal, J.; De Miguel, J.; Indurain, E.; and Mehta, G. 2001. Utility and entropy. Economic 
Theory 17:233-238. 

Dawid, A. 2010. Beware of the DAG! Journal of Machine Learning Research (to appear). 

Duff, M. O. 2002. Optimal learning: computational procedures for Bayes-adaptive Markov
decision processes. Ph.D. Dissertation. Director: Andrew Barto.

Gibbons, R. 1992. A Primer in Game Theory. Financial Times Prentice Hall. 

Haussler, D., and Opper, M. 1997. Mutual Information, Metric Entropy and Cumulative 
Relative Entropy Risk. The Annals of Statistics 25:2451-2492. 

Jaynes, E. T., and Bretthorst, L. G. 2003. Probability Theory: The Logic of Science.
Cambridge University Press.

Kappen, B.; Gomez, V.; and Opper, M. 2009. Optimal control as a graphical model 
inference problem. arXiv:0901.0633. 

Keller, G. 1998. Equilibrium States in Ergodic Theory. London Mathematical Society 
Student Texts. Cambridge University Press. 


Opper, M. 1998. A Bayesian Approach to Online Learning. Online Learning in Neural 
Networks 363-378. 

Ortega, P., and Braun, D. 2008. A minimum relative entropy principle for learning and 
acting. arXiv:0810.3605v3. 

Ortega, P., and Braun, D. 2010. A Bayesian rule for adaptive control based on causal 
interventions. In The third conference on artificial general intelligence, 121-126. Paris: 
Atlantis Press. 

Osborne, M. J., and Rubinstein, A. 1999. A Course in Game Theory. MIT Press. 

Pearl, J. 2000. Causality: Models, Reasoning, and Inference. Cambridge University Press, 
Cambridge, UK. 

Savage, L. J. 1954. The Foundations of Statistics. New York: John Wiley and Sons. 

Shafer, G. 1996. The art of causal conjecture. The MIT Press. 

Shannon, C. E. 1948. A mathematical theory of communication. Bell System Technical 
Journal 27:379-423 and 623-656. 

Simon, H. 1982. Models of Bounded Rationality. MIT Press. 

Spirtes, P., and Scheines, R. 2001. Causation, Prediction, and Search, Second Edition. MIT 
Press. 

Still, S. 2009. An information-theoretic approach to interactive learning. Europhysics 
Letters 85:28005. 

Todorov, E. 2006. Linearly solvable Markov decision problems. In Advances in Neural 
Information Processing Systems, volume 19, 1369-1376. 

Todorov, E. 2008. General duality between optimal control and estimation. In Proceedings 
of the 47th conference on decision and control, 4286-4292. 

Todorov, E. 2009. Efficient computation of optimal actions. Proceedings of the National 
Academy of Sciences U.S.A. 106:11478-11483. 

von Neumann, J., and Morgenstern, O. 1944. Theory of Games and Economic Behavior.
Princeton University Press.

Wolpert, D. 2004. Information theory - the bridge connecting bounded rational game theory
and statistical physics. Unpublished manuscript.






